Neural ADMIXTURE for rapid genomic clustering
Abstract
Characterizing the genetic structure of large cohorts has become increasingly important as genetic studies extend to massive, increasingly diverse biobanks. Popular methods decompose individual genomes into fractional cluster assignments, with each cluster representing a vector of DNA variant frequencies. However, with rapidly increasing biobank sizes, these methods have become computationally intractable. Here we present Neural ADMIXTURE, a neural network autoencoder that follows the same modeling assumptions as the current standard algorithm, ADMIXTURE, while reducing the compute time by orders of magnitude, surpassing even the fastest alternatives. One month of continuous compute using ADMIXTURE can be reduced to just hours with Neural ADMIXTURE. A multi-head approach allows Neural ADMIXTURE to offer even further acceleration by computing multiple cluster numbers in a single run. Furthermore, the models can be stored, allowing cluster assignment to be performed on new data in linear time without needing to share the training samples.
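The modeling assumption described in the abstract (each genome is a fractional mixture of clusters, and each cluster is a vector of variant frequencies) can be illustrated with a minimal NumPy sketch. This is not the published architecture: the single linear layer followed by a softmax, the random untrained weights, and the names `encode`, `decode`, `W`, `b`, and `F` are all assumptions made for illustration.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def encode(x, W, b):
    """Map genotypes (n_samples, n_variants) to fractional cluster
    assignments Q (n_samples, k); each row is non-negative and sums to 1.
    Assumption: one linear layer plus softmax, for illustration only."""
    return softmax(x @ W + b)


def decode(Q, F):
    """Reconstruct expected variant frequencies as a mixture of the
    per-cluster frequency vectors F (k, n_variants)."""
    return Q @ F


rng = np.random.default_rng(0)
n_samples, n_variants, k = 5, 20, 3

# Toy genotypes coded 0/1/2 alternate alleles, rescaled to [0, 1].
x = rng.integers(0, 3, size=(n_samples, n_variants)) / 2.0

# Hypothetical, untrained parameters (a real model would learn these).
W = rng.normal(scale=0.1, size=(n_variants, k))
b = np.zeros(k)
F = rng.uniform(0.0, 1.0, size=(k, n_variants))

Q = encode(x, W, b)      # fractional cluster assignments
x_hat = decode(Q, F)     # reconstructed variant frequencies
```

Because assigning clusters to a new sample under this sketch is a single matrix product with stored weights, its cost grows linearly with the number of variants, which matches the abstract's point that a stored model can be applied to new data without access to the training samples.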
Similar resources
Biological Data Mining for Genomic Clustering Using Unsupervised Neural Learning
The paper aims at designing a scheme for automatic identification of a species from its genome sequence. A set of 64 three-tuple keywords is first generated using the four types of bases: A, T, C and G. These keywords are searched on N randomly sampled genome sequences, each of a given length (10,000 elements) and the frequency count for each of the 4^3 = 64 keywords is performed to obtain a DNA-...
A Simple and Rapid Leaf Genomic DNA Extraction Method for Polymerase Chain Reaction Analysis
In plants, secondary metabolites and polysaccharides interfere with genomic isolation procedures and downstream reactions such as restriction enzyme analysis and gene amplification. The removal of such contaminants needs complicated and time-consuming protocols. In this study, a simple, rapid and efficient method for leaf DNA extraction was optimized. This method use small amount of plant mater...
Complex Patterns of Genomic Admixture within Southern Africa
Within-population genetic diversity is greatest within Africa, while between-population genetic diversity is directly proportional to geographic distance. The most divergent contemporary human populations include the click-speaking forager peoples of southern Africa, broadly defined as Khoesan. Both intra- (Bantu expansion) and inter-continental migration (European-driven colonization) have res...
Worldwide patterns of genomic variation and admixture in gray wolves.
The gray wolf (Canis lupus) is a widely distributed top predator and ancestor of the domestic dog. To address questions about wolf relationships to each other and dogs, we assembled and analyzed a data set of 34 canine genomes. The divergence between New and Old World wolves is the earliest branching event and is followed by the divergence of Old World wolves and dogs, confirming that the dog w...
Genomic signal processing for DNA sequence clustering
Genomic signal processing (GSP) methods which convert DNA data to numerical values have recently been proposed, which would offer the opportunity of employing existing digital signal processing methods for genomic data. One of the most used methods for exploring data is cluster analysis which refers to the unsupervised classification of patterns in data. In this paper, we propose a novel approa...
Journal
Journal title: Nature Computational Science
Year: 2023
ISSN: 2662-8457
DOI: https://doi.org/10.1038/s43588-023-00482-7